We provide a determination of the Gottfried sum from all available data, based on a neural
network parametrization of the nonsinglet structure function $F_2$. We find $S_G = 0.244 \pm 0.045$, closer
to the quark model expectation $S_G = 1/3$ than previous results. We show that the uncertainty from
the small $x$ region is somewhat underestimated in previous determinations.
The Gottfried sum

$$S_G(Q^2) = \int_0^1 \frac{dx}{x}\left[F_2^p(x,Q^2) - F_2^n(x,Q^2)\right] \qquad (1)$$

provides a determination of the light flavor asymmetry
of the nucleon sea. The discovery by the NMC Q, 
that $S_G$ deviates from the simple quark model expectation
$S_G = 1/3$ has provided first evidence for an up-down
asymmetry of the nucleon sea, a finding which has been 
subsequently confirmed in different contexts, is routinely 
included in modern parton fits, and has spawned a large 
theoretical literature 00. Because the scale dependence
of the Gottfried sum is known up to next-to-next-to-leading
order [6, 7], its precise determination is potentially
interesting for tests of QCD and the determination of the
strong coupling. 

The experimental determination of a sum rule, and 
especially of the associated uncertainty, is nontrivial be- 
cause structure function data are only available at dis- 
crete values of x and in general not all given at the same 
$Q^2$. Therefore, one needs interpolation and extrapolation
in $x$ in order to cover the full range $0 < x < 1$, and extrapolation
in $Q^2$ in order to bring all data to the same
$Q^2$. Also, it is not obvious how to combine data from
different experiments without losing information on ex- 
perimental errors and correlations. 

In Refs. H the NNPDF collaboration has pro- 
posed a method for the parametrization of structure func- 
tions and parton distributions, and has constructed a 
parametrization of the proton, deuteron, and nonsinglet
$F_2^{NS} \equiv F_2^p - F_2^n$ structure functions based on all available
experimental information, including experimental
and theoretical uncertainties and their correlation. This 
parametrization has been recently used by various au- 
thors 0, 0, as an unbiased interpolation of existing 
data. 

Here, we wish to provide a determination of the Got- 
tfried sum based on this parametrization, which, in the 
nonsinglet case, relevant for the Gottfried sum, is based 
on the structure function data from the NMC Qjj as well 
as those from the BCDMS collaboration which 
are rather more precise and cover a different kinematic 
region (see Fig.QJ. 

The parametrization of Refs. 0, El provides a Monte 
Carlo sample of replicas of the structure function for all
$x$ and $Q^2$, so the Gottfried sum and associated error can
be straightforwardly determined by integrating over x at 
fixed Q 2 , and averaging over the sample. The error on 
the parametrization blows up when extrapolating out- 
side the measured region, so that the region where re- 
liable predictions are obtained can be inferred from the 
parametrization itself. 
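
The replica-based determination described above can be illustrated with a minimal sketch. It assumes a toy ensemble of replicas with a hypothetical functional form ($A\sqrt{x}\,(1-x)^3$ with a Gaussian spread in $A$), which stands in for the actual neural network replicas and is not taken from them; the helper names `trapezoid` and `gottfried_from_replicas` are likewise illustrative.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid rule along the last axis (works for one curve or many)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def gottfried_from_replicas(replicas, x):
    """Mean and standard deviation of int dx/x F2_NS over a replica ensemble.

    replicas: array of shape (n_rep, n_x); each row is one replica of
              F2^p - F2^n evaluated on the grid x at a single fixed Q^2.
    """
    sums = trapezoid(replicas / x, x)      # one value of the sum per replica
    return sums.mean(), sums.std(ddof=1)

# Toy ensemble (hypothetical shape, NOT the NNPDF parametrization):
# F2_NS = A * sqrt(x) * (1 - x)^3 with a Gaussian spread in A.
rng = np.random.default_rng(0)
x = np.logspace(-3, 0, 2000)               # grid from x_min = 1e-3 up to 1
amplitudes = rng.normal(0.25, 0.05, size=500)
replicas = amplitudes[:, None] * np.sqrt(x) * (1.0 - x) ** 3

s_g, s_g_err = gottfried_from_replicas(replicas, x)
print(f"S_G(x > 1e-3) = {s_g:.4f} +/- {s_g_err:.4f}")
```

Because the integral is computed replica by replica before averaging, point-to-point correlations in $x$ are propagated automatically into the error on the sum.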

First, we compare to the result of Refs. 0,0 , where
the contribution to the Gottfried integral Eq. (1) from
the measured region $0.004 < x < 0.8$ at $Q^2 = 4$ GeV$^2$
is determined to be $S_G(0.004 < x < 0.8,\ 4~\mathrm{GeV}^2) =
0.2281 \pm 0.0065$ (stat.) $\pm\ 0.019$ (syst.) $= 0.2281 \pm 0.020$.
The previous determination of Ref. Q, based on about
half the statistics, had $S_G(0.004 < x < 0.8,\ 4~\mathrm{GeV}^2) =
0.221 \pm 0.008$ (stat.) $\pm\ 0.019$ (syst.) $= 0.221 \pm 0.021$.
Using the neural parametrization of Ref. 0] we get

$$S_G(0.004 < x < 0.8,\ 4~\mathrm{GeV}^2) = 0.2281 \pm 0.0437, \qquad (2)$$

where the error includes statistical and (correlated) 
systematic uncertainties which are combined in the 
parametrization of Ref. (NNPDF result, henceforth). 
Despite the (accidental) perfect agreement with NMC of
the central value of the NNPDF result Eq. (2), the
uncertainty is more than twice as large. This is surprising,
in view of the fact that the data used in the neural
parametrization include both NMC and BCDMS, and cover a wider
kinematic region.
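
As a quick check of the arithmetic behind this comparison, the statistical and systematic NMC errors quoted above can be combined in quadrature and set against the NNPDF uncertainty. The sketch assumes uncorrelated statistical and systematic errors; the function name `combine_in_quadrature` is illustrative.

```python
import math

def combine_in_quadrature(*errors):
    """Total error from independent contributions added in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

# Statistical and systematic errors as quoted in the text.
nmc_total = combine_in_quadrature(0.0065, 0.019)
nmc_previous_total = combine_in_quadrature(0.008, 0.019)
nnpdf_total = 0.0437

print(f"NMC total error:          {nmc_total:.3f}")           # 0.020
print(f"previous NMC total error: {nmc_previous_total:.3f}")  # 0.021
print(f"NNPDF / NMC error ratio:  {nnpdf_total / nmc_total:.2f}")
```

The ratio comes out slightly above two, which is the sense in which the NNPDF uncertainty is "more than twice as large".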

In fact, the dataset used by NMC in their determination
of the $F_2^n/F_2^p$ ratio which is used to compute the
Gottfried sum in Ref. Q is about four times as large as
that used by the same collaboration in their structure
function determination of Ref. [13], which was used to
construct the neural network parametrization p|, essentially
because the determination of a structure function
ratio allows more generous cuts than the absolute
determination of the structure function. However, it is unclear
whether this can explain the higher precision of the
determination of Ref. [3], given that the error is dominated by
systematics, and the determination of $S_G$ requires anyway
knowledge of at least one structure function on top
of the structure function ratio.

TABLE I: The contribution to the Gottfried sum at $Q^2 = 4$ GeV$^2$
from the region $x_{\min} < x < 0.8$ as obtained by
NMC [2] and with neural networks. The error is only
statistical for NMC, while it is the total combined statistical
and systematic uncertainty for NNPDF. The total NMC
systematics on $S_G(0.004 < x < 0.8)$ is equal to 0.019.

            S_G(x_min < x < 0.8)
x_min     NMC               NNPDF
0.004     0.221 ± 0.008     0.2281 ± 0.0437
0.010     0.213 ± 0.005     0.2378 ± 0.0273
0.020     0.203 ± 0.004     0.2334 ± 0.0232
0.040     0.183 ± 0.004     0.2157 ± 0.0217
0.060     0.171 ± 0.003     0.1985 ± 0.0202
0.100     0.149 ± 0.003     0.1693 ± 0.0169
0.150     0.125 ± 0.003     0.1398 ± 0.0133
0.200     0.107 ± 0.003     0.1154 ± 0.0107
0.300     0.074 ± 0.003     0.0761 ± 0.0074
0.400     0.047 ± 0.002     0.0460 ± 0.0052
0.500     0.025 ± 0.002     0.0241 ± 0.0035
0.600     0.012 ± 0.002     0.0102 ± 0.0019


In order to understand this state of affairs, in Table I
we compare the contribution to the Gottfried integral
Eq. (1) from the measured experimental region
$x_{\min} < x < 0.8$ [2] with that obtained from neural
networks. The NNPDF uncertainty includes
statistical and correlated systematic errors, while
the NMC experimental result Ref. Q only determines
the overall systematic uncertainty. Combining the total
NMC systematics (which is highly correlated between
bins) with the statistical error of Table I, the NNPDF
and NMC total uncertainties are seen to be in very good
agreement up to the next-to-smallest $x$ bin. However,
when the smallest $x$ bin is included the NNPDF uncertainty
almost doubles, while the NMC uncertainty (which
is dominated by systematics) is essentially unchanged.

This suggests that the NMC uncertainty from the
smallest $x$ bin might be underestimated. The reason
for this is understood by inspection of Fig. 2, where
the NNPDF and NMC determinations of the nonsinglet
structure function at $Q^2 = 4$ GeV$^2$ are compared. Note
that the NMC error bars are purely statistical, and that
the NNPDF error band has high point-to-point correlation
(so the error on $S_G$ is much smaller than the spread
of the integrals of the one-sigma curves). Note also that
the NMC data points [2] are obtained by combining their
determination of $F_2^d$ and of the ratio $F_2^n/F_2^p$, and
extrapolating the results at fixed $x$ to a common $Q^2$, while the
NNPDF results are obtained by interpolating and
extrapolating the full set of NMC and BCDMS $F_2$ data:
hence the two determinations should agree within errors,
but they are not expected to be on top of each other.

The two determinations are indeed seen to be in good
agreement. The agreement of the total uncertainty on $S_G$
for $x \gtrsim 0.01$ shows that, once systematics are included, the
total uncertainties also agree. At the smallest $x$ values,
$x \lesssim 0.01$, the uncertainty on $F_2$ blows up nonlinearly
as a function of $x$, due to the lack of smaller $x$ data
which could constrain the extrapolation. The NNPDF
result, which is obtained by integrating $F_2^{NS}$, reproduces this
blowup. The NMC result, based on summing over bins (i.e.
multiplying the value at the bin center by the bin width),
implicitly assumes that the error is linear across the bin
and thus underestimates the error on the last bin.
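
The effect of a bin sum on a nonlinearly growing error can be illustrated with a toy calculation. The sketch assumes a fully correlated error whose density on the integrand grows like $c/x^2$ toward small $x$; this shape and the constant `c` are hypothetical choices made only for illustration, not the actual NMC or NNPDF error profile. For any convex error density, the midpoint (bin-center times bin-width) estimate lies below the true integral.

```python
import numpy as np

def midpoint_estimate(f, a, b):
    """Bin-sum estimate: value at the bin center times the bin width."""
    return float(f(0.5 * (a + b)) * (b - a))

def trapezoid_integral(f, a, b, n=20001):
    """Fine-grid trapezoid integral, used here as the reference value."""
    x = np.linspace(a, b, n)
    y = f(x)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def toy_error_density(x, c=1.0e-4):
    """Hypothetical error on F2/x that blows up like 1/x^2 at small x."""
    return c / x**2

# Smallest bin of Table I: 0.004 < x < 0.010.
a, b = 0.004, 0.010
binned = midpoint_estimate(toy_error_density, a, b)
integrated = trapezoid_integral(toy_error_density, a, b)

print(f"bin-sum error:    {binned:.5f}")
print(f"integrated error: {integrated:.5f}")
print(f"ratio:            {binned / integrated:.2f}")
```

In this toy case the bin sum captures roughly four fifths of the integrated error; a steeper blowup would widen the gap.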

We conclude that the NMC error on the Gottfried sum
from the measured region is smaller than the NNPDF
error Eq. (2) entirely due to the contribution of the
smallest $x$ bin, and that this in turn is largely due to the fact
that the sum over bins by NMC underestimates the
nonlinear growth of the uncertainty at the edge of the data
region j^.

FIG. 2: The nonsinglet structure function $F_2^{NS}(x,Q^2)$ as a
function of $x$ at $Q^2 = 4$ GeV$^2$. The solid line is the
central value obtained from neural networks and the dashed lines
give the corresponding one-sigma error band (note errors are
highly correlated between different values of $x$). The
experimental points are from Ref. 0, the error bars are statistical
only.

Let us now turn to the best determination of $S_G$ that
can be obtained from neural networks. To this purpose,
we note that even though in principle the neural
parametrization of $F_2^{NS}$ provides a value for all $x$ and
$Q^2$, when extrapolating outside the data region this
determination becomes unreliable: the uncertainty grows
rapidly, but eventually the uncertainty itself is unreliable.
Furthermore, whereas the neural nets do satisfy
the kinematic constraint $F_2(1, Q^2) = 0$, they do not
satisfy the theoretical constraint $F_2(0, Q^2) = 0$, so the error
on $S_G$ would diverge if the sum rule were determined
by simply integrating from 0 to 1. This is as it should
be, because the $x \to 0$ region corresponds to the limit of
infinite energy, and thus it is even in principle
experimentally inaccessible: the associated error is therefore infinite
unless one makes some theoretical assumption.

Therefore, we determine the Gottfried sum by
integrating in $x$ at fixed $Q^2$ for $x_{\min} < x < 1$, and adding to
this integral a contribution from the small $x$ region
determined by extrapolation. Note that no extrapolation is
necessary in the large $x$ region, because the coverage of
the large $x$ region from the BCDMS data together with
the kinematical constraint at $x = 1$ are sufficient to pin
down the structure function with good accuracy at large $x$.

The small $x$ extrapolation requires theoretical
assumptions. In Ref. Ilj, it was assumed that $F_2(x,Q^2) \sim A x^b$,
and the constants $A$ and $b$ were determined by fitting to
the smallest $x$ data. However, the assumption that the
small-$x$ power behaviour has already set in in the
smallest measured $x$ bins does not seem justified. Indeed, in
the singlet case the small $x$ behaviour observed at HERA
is not seen in the NMC data and cannot be predicted by
them [sLIT^. Also, on theoretical grounds one would
expect the asymptotic small $x$ behaviour to set in around
$x \approx 10^{-3}$ [T3 . Hence, fitting the small $x$ exponent to the
data might lead to an underestimate of the uncertainty
on the small $x$ extrapolation, if the exponent $b$ comes out
too large.



TABLE II: Determination of the Gottfried sum with neural
networks. The scale is given in GeV$^2$. The contribution from
$x < x_{\min}$ is obtained by extrapolation and given 100%
uncertainty (see text).

Q^2        x_min    S_G(x_min < x < 1)    S_G
1          0.007    0.2566 ± 0.0773       0.2849 ± 0.0917
2          0.005    0.2522 ± 0.0389       0.2548 ± 0.0494
3          0.007    0.2430 ± 0.0299       0.2479 ± 0.0454
4          0.008    0.2380 ± 0.0302       0.2415 ± 0.0477
5          0.008    0.2330 ± 0.0340       0.2329 ± 0.0507
10         0.01     0.2246 ± 0.0428       0.2278 ± 0.0627
30         0.008    0.2395 ± 0.0632       0.2450 ± 0.0860
1.5-4.5    0.006    0.2438 ± 0.0320       0.2438 ± 0.0449


Therefore, we extrapolate by assuming that the structure
function at small $x$ displays the behaviour predicted
by Regge theory 01, $F_2(x,Q^2) \sim A\sqrt{x}$ as $x \to 0$: even if in
actual fact this behaviour were to set in at smaller $x$, we
would only be miscalculating the contribution to the
integral from the matching region. Of course, we cannot
exclude non-Regge behaviour at small $x$, but if the
Gottfried integral Eq. (1) exists at all, its integrand is
unlikely to diverge much stronger than $A/\sqrt{x}$ at small $x$. Note
that the small $x$ behaviour found by fitting to the data
in Ref. Q is in fact somewhat softer than this, namely
$F_2(x,Q^2) \sim A x^{0.59}$.

Hence, for all $(x, Q^2)$ with $0 < x < x_{\min}$ we take

$$F_2(x,Q^2) = F_2^{sx}(x,Q^2) = A\sqrt{x}. \qquad (3)$$

We fix the normalization coefficient $A$ by matching this
behaviour to the neural network result at a somewhat
larger value of $x$. This enables us to match at a value
of $x$ which is inside the data region, while only using
$F_2^{sx}(x, Q^2)$ Eq. (3) at the edge or outside the data region
itself. In practice, we match at $x = 1.5\,x_{\min}$; we have
checked that results change very little if we move the
matching point from $1.1\,x_{\min}$ to $2\,x_{\min}$. Matching to the
neural network determination of $F_2(1.5\,x_{\min}, Q^2)$ gives us
a one-sigma error band $A = A_{\rm match} \pm \sigma_A$.

The contribution to $S_G$ from $x < x_{\min}$ is determined
as the integral of $F_2^{sx}(x, Q^2)$ computed with $A = A_{\rm match}$.
This is given 100% uncertainty within the one-sigma
error band of $A$, namely, the extrapolation error is taken
to be equal to the integral of $F_2^{sx}(x,Q^2)$ computed with
$A = |A_{\rm match}| + \sigma_A$. This contribution to $S_G$ is then added
to the contribution from the measured region, while the
corresponding errors are added in quadrature.

For each value of $Q^2$ we can thus find the value of $x_{\min}$
which minimizes the total uncertainty. There is a tradeoff
in that if $x_{\min}$ is raised, the error on the measured region
decreases rapidly, but there is an increase in size of the
small $x$ extrapolation, which is 100% uncertain. The
results for the Gottfried sum $S_G$ Eq. (1), the contribution
from the measured region, and the value of $x_{\min}$ which
minimizes the error are shown in Table II.

FIG. 3: The value of the Gottfried sum $S_G$ (as given in
Table II) as a function of the scale. The curves show the scale
dependence computed in perturbative QCD at NNLO 0,0 :
upper curve assuming the quark model value $S_G(\infty) = 1/3$,
lower curve assuming our best-fit $S_G(1 < Q^2 < 5) = 0.245$.
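
The small $x$ extrapolation and the error combination described above can be written out as a short sketch. For the assumed form $A\sqrt{x}$ the contribution to the sum is $\int_0^{x_{\min}} \frac{dx}{x} A\sqrt{x} = 2A\sqrt{x_{\min}}$. The function names and the input values `f2_match` and `sigma_f2` below are hypothetical placeholders for the neural network value and error at the matching point; only the measured-region numbers are taken from the $Q^2 = 4$ GeV$^2$ row of Table II.

```python
import math

def small_x_contribution(f2_match, sigma_f2, x_min, match_factor=1.5):
    """Extrapolated contribution from 0 < x < x_min, assuming F2 = A*sqrt(x).

    f2_match, sigma_f2: structure function value and one-sigma error at the
    matching point x = match_factor * x_min (hypothetical inputs here).
    Returns (central value, error), the error being the same integral
    evaluated with A = |A_match| + sigma_A.
    """
    x_match = match_factor * x_min
    a_match = f2_match / math.sqrt(x_match)
    sigma_a = sigma_f2 / math.sqrt(x_match)
    # int_0^{x_min} dx/x * A*sqrt(x) = 2*A*sqrt(x_min)
    central = 2.0 * a_match * math.sqrt(x_min)
    error = 2.0 * (abs(a_match) + sigma_a) * math.sqrt(x_min)
    return central, error

def total_sum(measured, measured_err, extrap, extrap_err):
    """Add the two contributions; combine their errors in quadrature."""
    return measured + extrap, math.hypot(measured_err, extrap_err)

# Measured-region value from Table II at Q^2 = 4 GeV^2 (x_min = 0.008);
# the matching-point inputs are illustrative, not the NNPDF values.
extrap, extrap_err = small_x_contribution(f2_match=0.002, sigma_f2=0.020,
                                          x_min=0.008)
s_g, s_g_err = total_sum(0.2380, 0.0302, extrap, extrap_err)
print(f"extrapolation: {extrap:.4f} +/- {extrap_err:.4f}")
print(f"total S_G:     {s_g:.4f} +/- {s_g_err:.4f}")
```

Scanning `x_min` with such a function, given the measured-region error as a function of `x_min`, is one way to locate the minimum of the total uncertainty mentioned in the text.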

In Fig. 3 these values are compared to the NNLO
perturbative prediction for the scale dependence 0, Q ,
computed with the assumption that as $Q^2 \to \infty$ the naive
quark model value $S_G = 1/3$ is reproduced. This shows
that, even though the uncertainty in our determination
is rather larger, and the central value somewhat closer to
the quark model prediction than the NMC value 0, the
quark model value and hence flavour symmetry of the
light quark sea are somewhat disfavoured.

It is apparent from Fig. 3 that the predicted
perturbative dependence of $S_G$ is very slight, and it is in
fact entirely negligible on the scale of the error on $S_G$.
For example, the increase in $S_G$ from $Q^2 = 1$ GeV$^2$ to
$Q^2 = 10$ GeV$^2$ due to perturbative evolution is less than
1%. Hence, we may exploit the fact that neural networks
retain full information on correlations to combine the
determination of $S_G$ at different values of $Q^2$. When
correlations are fully taken into account, this can be done by
computing $S_G$ at an increasingly large number of values
of $Q^2_{\min} \le Q^2 \le Q^2_{\max}$, until the result no longer changes
as the number of values of $Q^2$ at which $S_G$ is calculated
in the given interval is increased: because the
correlation of $S_G(Q_1^2)$ and $S_G(Q_2^2)$ tends to one as $Q_1^2 \to Q_2^2$,
adding new points eventually stops bringing in new
information.
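
The way correlations limit the gain from combining several scales can be seen in a small replica-level sketch. It assumes a toy ensemble in which the values of $S_G$ at all scales share a uniform correlation `rho`; this is a simplification made for illustration (in the real parametrization the correlation depends on the separation in $Q^2$), and the function names are hypothetical.

```python
import numpy as np

def average_over_scales(s_replicas):
    """Combine S_G over several Q^2 values, replica by replica.

    s_replicas: array of shape (n_rep, n_q2). Averaging within each replica
    first keeps the full correlation between scales in the final error.
    """
    per_replica = s_replicas.mean(axis=1)
    return per_replica.mean(), per_replica.std(ddof=1)

def toy_replicas(n_rep, n_q2, center=0.244, sigma=0.045, rho=0.95, seed=0):
    """Toy ensemble with uniform correlation rho between scales (assumed)."""
    rng = np.random.default_rng(seed)
    common = rng.normal(size=(n_rep, 1))        # shared fluctuation
    indep = rng.normal(size=(n_rep, n_q2))      # scale-by-scale fluctuation
    return center + sigma * (np.sqrt(rho) * common + np.sqrt(1.0 - rho) * indep)

correlated = toy_replicas(n_rep=5000, n_q2=10, rho=0.95)
uncorrelated = toy_replicas(n_rep=5000, n_q2=10, rho=0.0)

_, err_corr = average_over_scales(correlated)
_, err_uncorr = average_over_scales(uncorrelated)
print(f"single-scale error:        {correlated[:, 0].std(ddof=1):.4f}")
print(f"average, rho = 0.95:       {err_corr:.4f}")
print(f"average, rho = 0 (for ref): {err_uncorr:.4f}")
```

With strong correlation the averaged error stays close to the single-scale error, whereas uncorrelated determinations would shrink it by roughly the square root of the number of points.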

However, because $S_G$ is very highly correlated between
different values of $Q^2$, and the uncertainty increases quite
fast when the scale is moved to low $Q^2$ (where data
uncertainties are larger) or high $Q^2$ (where there is little
data coverage at small $x$), the uncertainty on this
averaged determination is only marginally smaller than any
of those which we obtained at fixed $Q^2$. Optimizing both
the $Q^2$ range and the choice of $x_{\min}$ we get our best value

$$S_G(1.5 < Q^2 < 4.5~\mathrm{GeV}^2) = 0.244 \pm 0.045, \qquad (4)$$

which can be taken to hold for any $Q^2$ in the given range.
The NNLO $Q^2$ dependence of this result (assumed to
hold at $Q^2 = 3$ GeV$^2$) is also displayed in Fig. 3.

A more precise determination of the Gottfried sum will
only be possible once more data become available,
such as those which could be obtained by injecting deuterons
in HERA 0] , from a high-energy upgrade of JLAB 0] ,
or from future facilities, such as the Electron-Ion Collider
|2(| or a neutrino factory.